Corpus-based syntax-prosody tree matching

Author

  • Dafydd Gibbon
Abstract

Empirical study of the syntax-prosody relation is hampered by the fact that current prosodic models are essentially linear, while syntactic structure is hierarchical. The present contribution describes a syntax-prosody comparison heuristic based on two new algorithms: Time Tree Induction (TTI), for building a prosodic treebank from time-annotated speech data, and Tree Similarity Indexing (TSI), for comparing syntactic trees with the prosodic trees. Two parametrisations of the TTI algorithm, for different tree branching conditions, are applied to sentences taken from a read-aloud narrative, and compared with parses of the same sentences, using the TSI. In addition, null hypotheses in the form of flat bracketing of the sentences are compared. A preference for iambic (heavy rightmost branch) grouping is found. The resulting quantitative evidence for syntax-prosody relations has applications in speech genre characterisation and in duration models for speech synthesis.

1. Hierarchical syntax, linear prosody?

The objective of this contribution is to provide a well-defined algorithmic approach to extracting complex prosodic information from speech corpora. Current empirical models of speech timing are based on a variety of algorithms, from single indices for timing patterns [1, 2] in psycholinguistics and phonetics to, in the speech synthesis domain, sum-of-products, CART and Bayesian classification approaches [3, 4], including models which use grammatical information. Campbell [5] has a model based on a strictly layered hierarchy, but in general duration models are linear, and hold for flat strings of words or syntactic categories. When syntagmatic grammatical information is used as a predictor for hierarchical structuring, in general the information used is also linear, based on strings of paradigmatic part-of-speech (POS) classes (grammatical categories) which provide weight factors for duration models.
Wagner [6] uses a linear model for German speech synthesis based on five weighted POS sets. Grammatical categories imply at least local “hidden hierarchies”, of course. Explicit hierarchical approaches have rarely been used [7], and detailed approaches to the partially hierarchical description of timing are once more becoming available [8], [9, 10, 11]; but there is currently no technique available for data-driven investigation of more complex hierarchical duration models for syntagmatic prosodic relations, and the issue is not addressed in recent authoritative literature [12]. Classification methods need to include complex hierarchical timing information in addition to other phonetic and lexical properties of speech units. Further, it is a well-known phonostylistic effect that speech timing relations vary in highly complex ways depending on speech genre, including so-called fast speech phenomena [13]. Finally, other discourse factors such as focus and emotion are thought to affect prosody, thereby reducing the determining role of phrasal syntax, though these effects are currently not well understood (but see [14]).

Footnote 1: Thanks to Grazyna Demenko, Katarzyna Dziubalska-Kołaczyk, Ekaterina Iassinskaia, Peter Ladkin, Zofia Malisz and lecture audiences in Dublin, Bielefeld, Poznań and Tübingen for discussion and to Ulrike Gut, Katrin Johanning, Sara Johannsen, Josef Raab, Alexandra Thies, Thorsten Trippel for contributing data. The software developed for this work is in the public domain (GPL).

2. Linear timing measures

One set of approaches to investigating syntagmatic properties of timing is found in phonetic analyses of isochrony in syllable and foot timing. In [15], tone unit duration is divided by the number of feet in the tone unit, yielding average or “ideal” isochronous foot duration, and normalised deviation from mean foot length is measured.
Neither hierarchy nor linear alternation of timing units figures in the approach, which may be said to use a Global Evenness (GE) criterion as a measure of the isochrony property, rather than the alternation property. Any arbitrary re-sorting of the relevant segments in an utterance (random, shortest-to-longest, etc.) would yield the same index. Timing fulfils the GE criterion, in some sense, but it has other properties too, so while the GE criterion for timing is a necessary criterion for isochrony, it is (going beyond Roach’s stated goals, of course) not a sufficient criterion for an adequate timing model.

Ramus, Nespor & Mehler [2] locate different languages in a timing space with the following parameters: %V, percentage of vocalic intervals relative to overall utterance length; ΔC, variance of consonantal intervals; ΔV, variance of vocalic intervals. The model also uses a variant of the GE criterion: V stretches and C stretches would still yield the same results if re-sorted (randomly, by length, longer consonant sequences first, etc.). Similar considerations apply to the ΔV measure, which reflects evenness of vowel sequence lengths, lower values tending to isochrony, and to the ΔC measure. The model does not have hierarchical and alternating timing components and is thus incomplete as a model of rhythm timing, though it is claimed to be a model of rhythm. Cummins has pointed out [9] that the model makes a statement about the evenness of the phonotactics of the language, rather than timing. The model possibly reflects necessary conditions on timing, but falls short of providing a sufficient condition.

Low, Grabe & Nolan [1] addressed the GE issue and developed the Pairwise Variability Index (PVI) in order to take iterative alternation into account. The PVI measures normalised differences between the durations of adjacent units (vowels, syllables, etc.):
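The formula that follows this sentence is not reproduced in this excerpt. As a hedged illustration of the contrast drawn above, the sketch below compares an order-insensitive evenness index with a pairwise index. It assumes the normalised PVI form commonly cited in the literature (mean absolute difference of adjacent durations, each divided by the pair mean, times 100), which may differ in detail from the formula in the paper. The function names `global_evenness` and `npvi` and the duration values are hypothetical and chosen only for the example.

```python
from statistics import pstdev


def global_evenness(durations):
    """Order-insensitive spread (assumed stand-in for a GE-style index):
    population standard deviation divided by the mean duration."""
    mean = sum(durations) / len(durations)
    return pstdev(durations) / mean


def npvi(durations):
    """Normalised PVI as commonly cited (an assumption, not quoted from the paper):
    100 * mean over adjacent pairs of |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("need at least two durations")
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


# Invented durations in milliseconds: a strictly alternating short-long pattern,
# and the same values re-sorted from shortest to longest.
alternating = [100, 300, 100, 300, 100, 300]
resorted = sorted(alternating)

for label, seq in (("alternating", alternating), ("re-sorted", resorted)):
    print(label, global_evenness(seq), npvi(seq))
```

In this toy example the evenness value is identical for both orderings, while the pairwise index falls from 100 to 20 once the alternation is sorted away. That is the sense in which a GE-style measure ignores alternation and a pairwise measure does not. Neither index captures hierarchical grouping.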

Similar resources

Restricted Domain Malay Speech Synthesizer Using Syntax-Prosody Representation

The speech synthesis approach required in a restricted domain speech application is a synthesizer that has high quality, like the speech output of the ‘slot-filler’ approach, but has at least the minimal flexibility of a ‘genuine’ speech synthesizer. Thus, in this research study, we propose an alternative approach to creating a speech synthesizer for use in a restricted domain speech application. I...

Training prosodic phrasing rules for Chinese TTS systems

This paper describes several experiments designed to train prosodic phrasing models for Chinese TTS systems and to investigate the underlying rules that control Chinese prosody. First, we collected 559 sentences from news programs and built a large corpus for modeling Chinese prosody. Second, we selected 20 features and used classification and regression trees (CART) and transformational rule-b...

Where Should Pitch Accents and Phrase Breaks Go? A Syntax Tree Transducer Solution

Motivated by a desire to assess the prosody of foreign language learners, this study demonstrates the benefit of high-level syntactic information in automatically deciding where phrase breaks and pitch accents should go in text. The connection between syntax and prosody is well-established, and naturally lends itself to tree-based probabilistic models. With automatically-derived parse trees pair...

Employing Sentence Structure: Syntax Trees as Prosody Generators

In this paper, we describe a prosody generation system for speech synthesis that makes direct use of syntax trees to obtain duration and pitch. Instead of transforming the tree through special rules or extracting isolated features from the tree, we make use of the tree structure itself to construct a superpositional model that is able to learn the relation between syntax and prosody. We impleme...

Tree grammars as models of prosodic structure

The common ToBI system of transcription assumes a sequential model of prosody. Many linguists argue for a tree structure explaining the synchronization and interaction among prosodic units. Could tree grammars, used previously in syntax-based language modeling, be used to model prosodic trees? We present a method of converting sequential transcripts into trees, and then demonstrate that modelin...

Determining Image Similarity from Pattern Matching of Abstract Syntax Trees of Tree Picture Grammars

This paper studies the use of tree edit distance for pattern matching of abstract syntax trees of images generated with tree picture grammars. This was done with a view to measuring its effectiveness in determining image similarity, when compared to current state of the art similarity measures used in Content Based Image Retrieval (CBIR). Eight computer based similarity measures were selected f...

Publication date: 2003